Lab 1: Intro to R and data analysis

Day 1

Topics

  • Introduction to R and RStudio
    • Why R?
  • Understanding different types of variables
  • Handling R objects: vectors, matrices, data frames
  • Descriptive statistics
    • Measures of central tendency, measures of variability (or spread), and frequency distribution
  • Visual data exploration
    • {ggplot2}
  • Foundations of inference

Lab code

Here you will find the solved problems addressed in Lab 1:

  • as .qmd file (commented solutions)
  • as .R file

Lab datasets

Below are the datasets used in the Practice session:

  • .csv file
  • local subfolder

This workshop showcases introductory biostatistics concepts using the open-source (and free!) programming language R. Each session of the workshop features exercises that will help you learn by doing. Therefore, it is recommended that you pre-install the following on your machine.

Below is a quick step-by-step process that can help you get started.

Install R

R is available for free for Windows, GNU/Linux, and macOS.

  • To install R, you can go to this link. The latest available release is R 4.3.3 “Angel Food Cake”, released on 2024-02-29, but any (fairly recent) version will do.

If you have previously installed R on your machine, you can check which version you are running by executing this command in R:

# From the R console
base::R.version.string
    # (This is the version on my own machine)
    # [1] "R version 4.2.2 (2022-10-31)"

…or by executing this command in your CLI (Command Line Interface):

# From Terminal/Powershell/bash
R --version
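
If a script depends on a minimum version of R, the same check can also be done programmatically. The sketch below uses base R's getRversion(); the threshold "4.0.0" is only an illustrative assumption, not a requirement of this workshop.

```r
# Compare the running R version against a minimum version.
# NOTE: "4.0.0" is an arbitrary threshold chosen for illustration only.
min_version <- "4.0.0"
current_version <- getRversion()

if (current_version >= min_version) {
  message("R ", as.character(current_version), " is recent enough.")
} else {
  message("R ", as.character(current_version), " is older than ", min_version,
          "; consider updating.")
}
```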

Install RStudio IDE

While not strictly required, it is highly recommended that you also install RStudio to facilitate your work. RStudio Desktop is an Integrated Development Environment (IDE): essentially a graphical interface that wraps around R (which needs to be installed first).

R is a command-line-driven program. It can be run via its native interface (R GUI), as well as from many other code editors, like VS Code, Sublime Text, Jupyter Notebook, etc. RStudio remains the most widely used by beginners and advanced programmers alike, because of its intuitive and integrated interface.

  • To install RStudio, you can go to this link. The free version contains everything you need.

Install R packages from CRAN

An R package is a shareable bundle of functions. Besides the basic built-in functions already contained in the program (i.e. the base package), many useful R functions come in free libraries of code (or packages) written by R’s user community.

  • CRAN - the Comprehensive R Archive Network - is the general package repository for R: https://cran.r-project.org/.

  • Bioconductor -

  • GitHub -

Installing and using R packages

Let’s take for example the R package corrplot, a package for graphically exploring correlations. To install it for the first time, open an R session and execute:

# Installing (ONLY the 1st time)
utils::install.packages('corrplot')

Here you are actually calling a function (install.packages) from a pre-installed package (utils), using the syntax package_name::function_name.
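
To see the package::function syntax described above in action without installing anything, the following sketch calls median() from the stats package (which ships with R) in both its qualified and unqualified forms. The vector x is made-up data.

```r
# Both calls run the same function: median() lives in the 'stats' package,
# which R attaches automatically at startup.
x <- c(2, 4, 4, 9)

m_qualified <- stats::median(x)  # explicit package::function form
m_plain     <- median(x)         # relies on 'stats' being attached

identical(m_qualified, m_plain)
    # [1] TRUE
```

The qualified form is mostly useful when two loaded packages export a function with the same name, or when you want to call a single function without loading the whole package.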

Once you have installed a package, at every subsequent R session, you will only need to load it, like so:

# Loading a package (at every session)
base::library("corrplot")

# ... or
library(corrplot)

To inquire about a package and/or its functions, you can type ?package_name or ?function_name in your console, and RStudio will open the corresponding help page in its dedicated Help pane:

# Opening Help page on package/function
?corrplot

You can also install and update packages using the “Packages” tab in the lower right pane of RStudio.

Screenshot: installing/updating packages from RStudio

Managing files and projects

In any analytical endeavor it is very likely that you will handle a collection of files (likely organized in folders, such as input_data, output_data, R_scripts, paper, etc.). RStudio provides a fantastic tool for organizing all the files pertaining to a project, called an “R Project”.
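
As an illustration of such a folder layout, the sketch below creates a few of the folders mentioned above using base R only. It works inside a temporary directory so that nothing is written to your real files; "my_project" is a hypothetical project name.

```r
# Create a project skeleton inside a temporary directory.
# "my_project" is a made-up name used only for this demo.
project_dir <- file.path(tempdir(), "my_project")
subfolders  <- c("input_data", "output_data", "R_scripts")

for (folder in subfolders) {
  # recursive = TRUE also creates project_dir itself if it is missing
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist at the top level of the project
list.dirs(project_dir, full.names = FALSE, recursive = FALSE)
```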

Creating an R Project

Creating an R Project will keep all the files associated with a project (including invisible ones!) organized together – input data, R scripts, analytical results, figures. Besides being common practice, this has the advantage of implicitly setting the “working directory”, which is incredibly important when you need to load or output files, specifying their file path.

Defining (reproducible) file paths

It is never good practice to “hard code” the complete file path of a file: most likely this will break your code as soon as you (or someone else) need to run it on a different machine, let alone on a different OS.

# [NOT REPRODUCIBLE] hard coding your file path  --------------------------
library(readr)
# File path on Mac:
dataset <- read_csv("/Users/testuser/R4biostats/input_data/dataset.csv")
# Same file path on Windows (backslashes must be doubled inside an R string):
dataset <- read_csv("C:\\Users\\testuser\\R4biostats\\input_data\\dataset.csv")

This is where the fantastic here package comes in to help, as it will define file paths in a “reproducible manner” as long as you have created an R Project.

# [REPRODUCIBLE!!] Expressing your file path in a system agnostic way! --------
library(here)
library(readr)

# Check which directory here() treats as the project root
here::here()
    # [1] "/Users/testuser/R4biostats"

# Then define file path as ("subfolder_name", "file_name")
# No "\" or "/" needed!
dataset <- read_csv(here("input_data", "dataset.csv"))

The here package uses the top-level directory of a project (where you have placed your proj_name.Rproj file) as the reference to easily, and portably, build paths to files.
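
The underlying principle (build the path from its components and never type the separators yourself) can be tried out without any project set up, using base R's file.path(). This is a sketch of the idea rather than a substitute for here: a temporary directory stands in for the project root, and the tiny data frame is made up for the demo.

```r
# A temporary directory stands in for the project root (assumption for the demo)
root_dir <- tempdir()
dir.create(file.path(root_dir, "input_data"), showWarnings = FALSE)

# Build the path from its components: no "\" or "/" typed by hand
csv_path <- file.path(root_dir, "input_data", "dataset.csv")

# Write a tiny made-up dataset, then read it back from the same path
toy_data <- data.frame(id = 1:3, weight_kg = c(60.5, 72.1, 80.3))
write.csv(toy_data, csv_path, row.names = FALSE)
dataset <- read.csv(csv_path)

nrow(dataset)
    # [1] 3
```

Unlike here(), file.path() does not locate the project root for you: the result is only as portable as the starting directory you give it.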

Objects and functions

R packages that will be required for the workshop

To install the required R packages, open an R session and execute:

# Installing (only the 1st time)
pkg_list <- c("tidyverse", "quarto", "rmarkdown", "palmerpenguins")
install.packages(pkg_list)

# Loading a package (at every session) 
library("tidyverse")
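
Since installation is only needed once, it can be handy to check first which of the workshop packages are still missing. The sketch below is one possible way to do that with base R's requireNamespace(); missing_pkgs() is a hypothetical helper written for this example, not a function from any package, and the install call is left commented out so that running the snippet never downloads anything.

```r
# Return the subset of `pkgs` that is NOT installed yet.
# missing_pkgs() is a hypothetical helper, not part of any package.
# Note: requireNamespace() loads (but does not attach) each package it finds.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkg_list   <- c("tidyverse", "quarto", "rmarkdown", "palmerpenguins")
to_install <- missing_pkgs(pkg_list)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # Uncomment to actually install them:
  # install.packages(to_install)
} else {
  message("All workshop packages are already installed.")
}
```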